    Generalization through the lens of learning dynamics

    A machine learning (ML) system must learn not only to match the output of a target function on a training set, but also to generalize to novel situations in order to yield accurate predictions at deployment. In most practical applications, the user cannot exhaustively enumerate every possible input to the model; strong generalization performance is therefore crucial to the development of ML systems which are performant and reliable enough to be deployed in the real world. While generalization is well-understood theoretically in a number of hypothesis classes, the impressive generalization performance of deep neural networks has stymied theoreticians. In deep reinforcement learning (RL), our understanding of generalization is further complicated by the conflict between generalization and stability in widely-used RL algorithms. This thesis will provide insight into generalization by studying the learning dynamics of deep neural networks in both supervised and reinforcement learning tasks. We begin with a study of generalization in supervised learning. We propose new PAC-Bayes generalization bounds for invariant models and for models trained with data augmentation. We go on to consider more general forms of inductive bias, connecting a notion of training speed with Bayesian model selection. This connection yields a family of marginal likelihood estimators which require only sampled losses from an iterative gradient descent trajectory, along with analogous performance estimators for neural networks. We then turn our attention to reinforcement learning, laying out the learning dynamics framework for the RL setting which will be leveraged throughout the remainder of the thesis. We identify a new phenomenon which we term capacity loss, whereby neural networks lose their ability to adapt to new target functions over the course of training in deep RL problems, and for which we propose a novel regularization approach. Follow-up analysis studying more subtle forms of capacity loss reveals that deep RL agents are prone to memorization due to the unstructured form of early prediction targets, and highlights a solution in the form of distillation. We conclude by returning to a different notion of invariance from the one with which this thesis began, presenting a novel representation learning method which promotes invariance to spurious factors of variation in the environment. Comment: PhD Thesis.
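
    To make the training-speed idea concrete, below is a minimal sketch, not the thesis' exact estimator: it scores hypothesis classes by summing each example's (Gaussian) log-likelihood evaluated just before the model takes an SGD step on that example, so models that train faster accumulate a better score. The toy sparse linear-regression setup and the name `training_speed_score` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def training_speed_score(X, y, lr=0.1):
    """Sum of per-example log-likelihood terms, each evaluated before the
    example is used for an SGD update (a crude 'sum of sampled losses along
    the training trajectory' model-selection score)."""
    w = np.zeros(X.shape[1])
    score = 0.0
    for x, target in zip(X, y):
        pred = w @ x
        score += -0.5 * (pred - target) ** 2   # log p(target | earlier data), unit variance, up to a constant
        w += lr * (target - pred) * x          # one SGD step on squared error
    return float(score)

# Toy comparison: data from a sparse linear model, scored by hypothesis
# classes that use only the first k features.
d, n = 8, 200
X = rng.normal(size=(n, d))
true_w = np.array([1.0, -2.0, 0.5] + [0.0] * (d - 3))
y = X @ true_w + 0.1 * rng.normal(size=n)

for k in (1, 3, 8):
    print(k, round(training_speed_score(X[:, :k], y), 1))
```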

    A Comparative Analysis of Expected and Distributional Reinforcement Learning

    Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL. In cases where the two methods behave differently, distributional RL can in fact hurt performance when it does not induce identical behaviour. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators to tease apart where the improvements from distributional RL methods are coming from. Comment: To appear in the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.
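
    As a rough illustration of the tabular comparison, the sketch below runs classical expected TD alongside a distributional "mixture" update on a made-up two-state Markov reward process (the chain, rewards, and hyperparameters are invented for the demo, not taken from the paper); the mean of the distributional estimate tracks the TD estimate exactly, which is the flavour of equivalence at stake.

```python
import numpy as np

gamma, alpha = 0.9, 0.1
rng = np.random.default_rng(1)

V = np.zeros(2)                               # expected RL: one value per state
atoms = [np.array([0.0]), np.array([0.0])]    # distributional RL: return distributions
probs = [np.array([1.0]), np.array([1.0])]    #   stored as weighted atoms

def dist_mean(s):
    return float(atoms[s] @ probs[s])

for _ in range(20):                           # few steps: the atom lists grow quickly
    s = int(rng.integers(2))
    s_next = 1 - s
    r = 1.0 if s == 0 else -0.5

    # Expected TD update toward the bootstrap target.
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

    # Distributional mixture update: keep the old distribution with weight
    # (1 - alpha) and mix in the bootstrap target distribution r + gamma * Z(s').
    atoms[s] = np.concatenate([atoms[s], r + gamma * atoms[s_next]])
    probs[s] = np.concatenate([(1 - alpha) * probs[s], alpha * probs[s_next]])

    # The implied mean of the distributional estimate matches TD exactly.
    assert np.allclose(V, [dist_mean(0), dist_mean(1)])

print(V, [dist_mean(0), dist_mean(1)])
```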

    Managing Networks for School Improvement: Seven Lessons from the Field

    In recent decades, new networks for school improvement (NSI) have proliferated across the country. These emerging organizational structures present education leaders with an opportunity to build dynamic infrastructures to engage schools in improvements to teaching and learning. NSI are diverse. Some NSI are part of school districts, while others are contracted by school districts to design blueprints for school improvement. What all NSI have in common is a central hub supporting a set of member schools, like the center of a wheel and its spokes. In this guidebook, we focus on common lessons for designing improvement infrastructures from the perspective of leaders across four different types of networks: local district superintendents who support schools in a particular geographic area; field support centers, which partner with district superintendents in the intermediary space between the central office and schools; affinity organizations, which are independent non-profit organizations that work under contract from the central district office to support a select group of district schools; and charter school management organizations that operate outside the district, supporting their affiliated member schools. Our aim was to better understand how NSI were responding to the increased demands of recent shifts to more rigorous college- and career-ready standards. These seven lessons emerged from interviews with central office administrators overseeing NSI and staff working in network hubs, as well as from observations of professional learning (PL) sessions provided by hubs. We hope these lessons are useful to your work improving teaching and learning in your school, network, or district.

    The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation

    We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach the surprising conclusion that even if a practitioner has no interest in the return distribution beyond the mean, QTD (which learns predictions about the full distribution of returns) may offer performance superior to approaches such as classical TD learning, which predict only the mean return, even in the tabular setting. Comment: ICML 2023.
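
    To show what the tabular setting looks like in code, here is a small sketch comparing classical TD with the standard tabular QTD update on a toy two-state chain with noisy rewards (the chain and all hyperparameters are invented for the demo); QTD maintains m quantile estimates per state, and a value estimate can be read off as their mean.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha, m = 0.9, 0.05, 32
tau = (2 * np.arange(m) + 1) / (2 * m)   # quantile midpoint levels

V = np.zeros(2)              # classical TD: one mean estimate per state
theta = np.zeros((2, m))     # QTD: m quantile estimates per state

for _ in range(20000):
    s = int(rng.integers(2))
    s_next = 1 - s
    r = rng.normal(loc=1.0 if s == 0 else -0.5, scale=1.0)   # noisy reward

    # Classical TD update toward the sampled bootstrap target.
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

    # QTD update: each quantile estimate theta_i(s) moves up by tau_i and down
    # by the fraction of bootstrap targets r + gamma * theta_j(s') lying below it.
    targets = r + gamma * theta[s_next]                              # shape (m,)
    below = (targets[None, :] < theta[s][:, None]).mean(axis=1)      # shape (m,)
    theta[s] += alpha * (tau - below)

print("TD value estimates:  ", V)
print("QTD mean-of-quantiles:", theta.mean(axis=1))
```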

    Understanding plasticity in neural networks

    Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness of deep reinforcement learning systems. Deep neural networks are known to lose plasticity over the course of training even in relatively simple learning problems, but the mechanisms driving this phenomenon are still poorly understood. This paper conducts a systematic empirical analysis into plasticity loss, with the goal of understanding the phenomenon mechanistically in order to guide the future development of targeted solutions. We find that loss of plasticity is deeply connected to changes in the curvature of the loss landscape, but that it often occurs in the absence of saturated units. Based on this insight, we identify a number of parameterization and optimization design choices which enable networks to better preserve plasticity over the course of training. We validate the utility of these findings on larger-scale RL benchmarks in the Arcade Learning Environment. Comment: Accepted to ICML 2023 (oral presentation).
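
    One rough way to probe the phenomenon outside of RL benchmarks is to train a single network on a sequence of freshly drawn random target functions and watch whether it fits later targets less well than earlier ones. The sketch below is a minimal version of such a probe; the architecture, targets, and hyperparameters are placeholders rather than the paper's protocol, and whether a clear effect appears depends heavily on these choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(16, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
X = torch.randn(512, 16)

for phase in range(10):
    # A new random target function each phase: a freshly initialised small MLP.
    target_net = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 1))
    with torch.no_grad():
        y = target_net(X)

    for _ in range(500):                      # fit the current target
        loss = nn.functional.mse_loss(net(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Rising final-fit loss across phases would indicate plasticity loss.
    print(f"phase {phase}: final fit loss {loss.item():.4f}")
```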